Materials informatics approach using domain modelling for exploring structure–property relationships of polymers

In the development of polymer materials, exploring the complex relationships between domain structure and physical properties is an important issue. In the domain structure analysis of polymer materials, 1H-static solid-state NMR (ssNMR) spectra can provide information on mobile, rigid, and intermediate domains. However, estimating the domain structure from these spectra is difficult because the signals from multiple domains overlap widely. We have therefore developed a materials informatics approach that combines domain modeling (http://dmar.riken.jp/matrigica/) with the integrated analysis of meta-information (elements, functional groups, additives, and physical properties) of polymer materials. First, the 1H-static ssNMR data of 120 polymer materials were subjected to a short-time Fourier transform to obtain the frequency, intensity, and T2 relaxation time of domains with different mobility. The average T2 relaxation times were 0.96 ms for Mobile, 0.55 ms for Intermediate (Mobile), 0.32 ms for Intermediate (Rigid), and 0.11 ms for Rigid domains. Second, the estimated domain proportions were integrated with meta-information such as elements, functional groups, and thermophysical properties and analyzed using a self-organizing map and market basket analysis. The proposed method can help explore structure–property relationships of polymer materials with multiple domains.

www.nature.com/scientificreports/
to chemical shift anisotropy for high resolution. On the other hand, there are different complementary approaches to tackling the complexity of polymer domain structure. 1H-static ssNMR can be applied to quantify domain mobility in terms of dynamic heterogeneity 20. In addition, magic-and-polarization echo (MAPE) 21 and double-quantum (DQ) 22 filters can determine the spectral parameters of the mobile amorphous domains, which show a long decay, and of the strongly dipole-dipole-coupled crystalline domains, which decay quickly, respectively. For a solid-state sample containing rigid, intermediate, and mobile domains, 1H-static ssNMR measurement is a useful probe of the molecular mobility of higher-order structures, although its analysis is difficult because the spectrum is broadened and overlapped 23. Signal deconvolution is therefore needed to characterize the structure and properties of the sample. Several methods for spectral separation 19, fitting, and numerical simulation 24, such as SIMPSON 25, SPINEVOLUTION 26, dmfit 27, EASY-GOING deconvolution 28, INFOS 29, Fityk 30, ssNake 31, and a noise reduction method based on principal component analysis 32, have been developed. So far, signal simulation and fitting in NMR data analysis have targeted either the frequency domain or the time domain alone. In our previous studies, we proposed signal deconvolution methods that combine the short-time Fourier transform (STFT; a time-frequency analysis method) with probabilistic sparse matrix factorization 33 and with non-negative tensor/matrix factorization 34. Because our STFT-based method simulates the signal in both the frequency and time domains, it could separate signals according to the mobility of the domain structure, using chemical shift and T2 relaxation time as indicators.
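The idea of separating signals by both frequency and T2 can be illustrated with a minimal, pure-Python sketch; it is not the authors' implementation, which uses scipy.signal. Under stated assumptions (a synthetic complex FID built from two damped lines, rectangular STFT windows, and line frequencies placed exactly on STFT bins), each frequency row of the STFT decays at its own rate, and a log-linear fit of that row against frame time recovers T2. The dwell time, window, hop, and the two T2 values (borrowed from the reported Mobile and Rigid averages) are illustrative choices.

```python
import cmath
import math

def synthetic_fid(components, n_points, dt):
    """Complex FID: sum of A * exp(-t/T2) * exp(2*pi*i*f*t) over (A, f, T2) tuples."""
    return [
        sum(a * math.exp(-n * dt / t2) * cmath.exp(2j * math.pi * f * n * dt)
            for a, f, t2 in components)
        for n in range(n_points)
    ]

def stft_bin_magnitudes(signal, k, window_len, hop):
    """|DFT bin k| of each rectangular-window frame, i.e. one frequency row of the STFT."""
    mags = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = signal[start:start + window_len]
        x = sum(v * cmath.exp(-2j * math.pi * k * n / window_len)
                for n, v in enumerate(frame))
        mags.append(abs(x))
    return mags

def fit_t2(mags, hop, dt):
    """Least-squares slope of ln|X| against frame start time equals -1/T2."""
    times = [j * hop * dt for j in range(len(mags))]
    logs = [math.log(m) for m in mags]
    mean_t = sum(times) / len(times)
    mean_l = sum(logs) / len(logs)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    return -1.0 / slope

# Illustrative acquisition/STFT settings (assumptions, not the paper's values).
DT = 1e-5                      # 10 us dwell time
WINDOW, HOP = 64, 16
BIN_HZ = 1.0 / (WINDOW * DT)   # 1562.5 Hz per STFT frequency bin

if __name__ == "__main__":
    fid = synthetic_fid([(1.0, 4 * BIN_HZ, 0.96e-3),    # slowly decaying "mobile" line
                         (1.0, 20 * BIN_HZ, 0.11e-3)],  # quickly decaying "rigid" line
                        512, DT)
    t2_mobile = fit_t2(stft_bin_magnitudes(fid, 4, WINDOW, HOP), HOP, DT)
    print(f"estimated mobile T2: {t2_mobile * 1e3:.3f} ms")
```

With a single on-bin line the fit is exact; with two lines, spectral leakage slightly perturbs each row, and the fast-decaying row is the more affected one at late frames, so a practical fit would restrict the time range or model both components jointly.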
The NMR signal can be modeled with functions such as the Lorentzian 35, Gaussian 36, and Voigt 37 functions in the frequency domain, and with the T2 relaxation equation 38 in the time domain. In addition, differences in T2 relaxation behavior can be accommodated by the Weibull coefficient 39. Analysis of the relaxation of a sample's free-induction decay (FID) provides important insights into its chemical composition, structure, and mobility 38,40.
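The lineshape and relaxation functions named above can be written down in a few lines. The sketch below uses peak-height and FWHM conventions chosen for illustration; the "Voigt" entry is a pseudo-Voigt (a weighted sum of Lorentzian and Gaussian with a shared width), which is a common approximation and not the exact Voigt convolution of ref. 37. The time-domain decay is a stretched or compressed exponential whose exponent `w` plays the role of the Weibull coefficient; the parameter names are our own.

```python
import math

def lorentzian(f, f0, fwhm, amp=1.0):
    """Lorentzian line with peak height `amp` at f0 and full width at half maximum `fwhm`."""
    hw = fwhm / 2.0
    return amp * hw ** 2 / ((f - f0) ** 2 + hw ** 2)

def gaussian(f, f0, fwhm, amp=1.0):
    """Gaussian line with the same peak-height and FWHM conventions."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return amp * math.exp(-((f - f0) ** 2) / (2.0 * sigma ** 2))

def pseudo_voigt(f, f0, fwhm, eta, amp=1.0):
    """eta * Lorentzian + (1 - eta) * Gaussian with a shared FWHM.
    An approximation to, not an exact evaluation of, the Voigt profile."""
    return eta * lorentzian(f, f0, fwhm, amp) + (1.0 - eta) * gaussian(f, f0, fwhm, amp)

def weibull_decay(t, t2, w, amp=1.0):
    """Time-domain decay amp * exp(-(t / T2) ** w).
    w = 1 gives a plain exponential; w = 2 gives a Gaussian-shaped decay."""
    return amp * math.exp(-((t / t2) ** w))
```

Because all three frequency-domain shapes share the FWHM convention, swapping one for another during fitting changes only the tail behavior, not the meaning of the width parameter.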
The domain structure of a polymer has a significant influence on its macroscopic properties 41. Materials informatics, an emerging field, supports the analysis of structure-property relationships from materials data sets 2,3,7,8,42. NMR signals have potential for use as descriptors that capture the structural features of molecules contributing to their physical, chemical, and biological properties 43. In previous studies, a self-organizing map (SOM) has been applied to tool wear monitoring 44, market basket analysis (MBA) to the prediction of drug-drug interactions 45, and Bayesian optimization to real and virtual degradation experiments of bioplastics 46. Generative topographic mapping regression (GTMR) has also been applied to the analysis of CP-MAS spectra to predict the solid-state 13C NMR spectrum of a material from its thermophysical properties 34. Machine learning methods have been applied to various materials studies, such as cloud-point engineering of polymers 47, prediction of drug-polymer amorphous solid dispersion miscibility and stability 48, prediction of atomic/interatomic properties 49, solubility prediction 50, descriptor selection for investigating physical properties of biopolymers in hair 51, classification of membrane materials 52, prediction of crystallization tendency 53, prediction of the density, glass transition temperature, melting temperature, and dielectric constant of polymers 9, and macromolecular modeling 54.
In this study, we propose a materials informatics approach to explore the structure-property relationships of polymers that combines polymer domain modeling with the integrated analysis of polymer meta-information. For domain modeling, 1H-static ssNMR spectral parameters obtained by STFT, namely T2 relaxation time, frequency, and intensity, were used. Domains with different mobility in each polymer material were estimated by fitting physical indices such as T2 relaxation time, frequency, and linewidth. In addition, the relationships between the estimated domain structure and meta-information such as elements, functional groups, and thermophysical properties were explored using SOM and MBA.

Results and discussion
A materials informatics approach to exploring structure-property relationships using domain modeling. The conceptual diagram of the approach is shown in Fig. 1, and the detailed analytical flow in Supporting Information Fig. S1. The input polymer information consists of 1H-static ssNMR data, the primary structure of the polymer, and thermophysical property data (TG/DTA/DTG). Frequency and time information are then obtained by applying STFT to the FIDs acquired by 1H-static ssNMR. Domain modeling proceeds as follows. First, the domain components are separated by fitting the obtained frequency and time information (Fig. S2). Second, the domain component ratios are calculated by 3D modeling (Fig. S3). To minimize the error between the sum of the simulated domain components obtained with our signal processing method and the original static data, Bayesian optimization was performed on Eq. 1 (see the Materials and methods section) to search for the ratio between the mobile and rigid components according to Eqs. (2) to (5). We then applied statistical analysis, MBA, and SOM, which are materials informatics methods, to the obtained domain information, primary structure information, and thermophysical property information to associate structure with physical properties. The detailed results are shown in the following sections.
into four domain components: Mobile, Intermediate (Mobile), Intermediate (Rigid), and Rigid, which we regard as the mobile, slightly mobile, slightly rigid, and rigid components of the material, respectively.
As a result, the domain component ratios differed not only among different polymer materials but also among similar materials composed of the same monomers, which can be attributed to differences in molding conditions and molecular weight. In the case of PCL (Fig. 2 (Fig. 3). The box-and-whisker plot shows average T2 relaxation times of 0.96 ms for Mobile, 0.55 ms for Intermediate (Mobile), 0.32 ms for Intermediate (Rigid), and 0.11 ms for Rigid. The weighted average (WA) varied among the polymer materials; it was highest for PCL (Fig. 3, red squares) and lowest for PHBH (Fig. 3, yellow circles). The same held for the thermal analysis data (Table S3). Estimated domain ratio diagrams, based on the domain component ratios, are inset for PCL and PHBH, which have the highest and lowest WA, respectively.

Results
Self-organizing map analysis integrating domain proportions and quantitative spectral data in polymer materials. To evaluate the relationships between domain structure and thermophysical properties of polymer materials, we integrated the domain proportions described in the previous section, 13C-CP/MAS spectra 55, which readily reflect the primary chemical structure, and quantitative thermophysical data (thermogravimetry (TG), differential thermal analysis (DTA), derivative thermogravimetry (DTG), and differential scanning calorimetry (DSC)) 56. To capture the characteristics of the integrated data, clustering by SOM was performed (Fig. 4). The SOM input data, namely the domain component information calculated by the component separation method and the meta-information on thermophysical properties, elements, and functional groups, are listed in Tables S1 and S2. The materials clustered as follows: navy blue circles are polyethylene terephthalate (PET) and light blue circles are polyethylene (PE), with the clusters of these polymer materials at the top; blue triangles are poly(butylene adipate-co-terephthalate) (PBAT); light pink circle  Table S3).

Market basket analysis integrating quantitative domain proportion and qualitative meta-information in polymer materials.
The MBA was performed to evaluate the relationships between quantitative domain proportions and qualitative meta-information such as elements (linearly connected methylenes > 4 carbons), functional groups (aromaticity), and thermophysical properties. The MBA input data, namely the domain component information calculated by the component separation method and the meta-information on thermophysical properties, elements, and functional groups, are listed in Tables S1, S2, and S3. Figure 5 shows the network diagram built from the MBA transaction data, in which the T2 information of the four domain components (Mobile, Intermediate (Mobile), Intermediate (Rigid), and Rigid) shows high lift values with the primary structure information (aromaticity, linearly connected methylenes > 4 carbons), while the thermophysical properties (melting temperature, Tm; thermal decomposition temperature, Td; glass transition temperature, Tg) show dominant lift values with the structural information. Here, lift is one of the indicators used for correlation analysis in MBA. Figure S5 shows the MBA network diagram using the temperature information from thermal analysis, where the Mobile domain ratio correlated with the lower temperature  . The Rigid domain ratio correlated with both the lower-temperature (< 100 °C) and the higher-temperature (> 300 °C) thermophysical properties. In general, PCL is a thermoplastic biodegradable polyester with good thermal processability and a low melting point 57. In PCL, the DTG data show a melting temperature of 66.5 °C (Fig. S4, Table S3). The lower-temperature DTG peak, on the other hand, is correlated with the Rigid domain (Fig. S5a).
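Lift, support, and confidence for a two-item rule can be computed directly from a list of transactions. The sketch below uses the standard definitions (the same measures arules reports); the item labels and the eight toy transactions are hypothetical and are not data from this study.

```python
def rule_metrics(transactions, lhs, rhs):
    """Support, confidence, and lift of the rule {lhs} -> {rhs}.

    support    = P(lhs and rhs)
    confidence = P(rhs | lhs)
    lift       = confidence / P(rhs); 1.0 means lhs and rhs are independent.
    """
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs in t)
    n_rhs = sum(1 for t in transactions if rhs in t)
    n_both = sum(1 for t in transactions if lhs in t and rhs in t)
    if n_lhs == 0 or n_rhs == 0:
        raise ValueError("rule items must each occur at least once")
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift

# Hypothetical transactions: one set of items per polymer material.
TOY = [
    {"Mobile=high", "methylene>4"},
    {"Mobile=high", "methylene>4"},
    {"Mobile=high", "methylene>4"},
    {"Mobile=high", "aromatic"},
    {"Rigid=high", "aromatic"},
    {"Rigid=high", "aromatic"},
    {"Rigid=high", "methylene>4"},
    {"aromatic"},
]

if __name__ == "__main__":
    print(rule_metrics(TOY, "Mobile=high", "methylene>4"))
```

In this toy set, three of the four "Mobile=high" materials also carry "methylene>4", which occurs in half of all materials, so the lift is 0.75 / 0.5 = 1.5: the pair co-occurs 1.5 times more often than independence would predict.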

Conclusion
In materials development, in addition to the chemical structure from the primary to the higher-order level, meta-information such as the molding process and additives are important factors because they strongly influence the final material properties. We have developed a materials informatics approach that combines domain modeling with the integrated analysis of materials meta-information. To estimate domain structure, we introduced a time-frequency simulation method that calculates multiple domain components from 1H-static ssNMR spectra. In the integrated analysis of domain proportions and meta-information, SOM was a useful tool for capturing trends across the polymer material data, whereas MBA could identify strong relationships between structure and meta-information in individual materials, including qualitative as well as quantitative data. The relationships between domain mobility and melting temperature were consistent with the results of SOM and MBA (Figs. 3, 4, 5) 57,59. This materials informatics approach is expected to help efficiently explore structure-property relationships of high-performance polymer materials with low environmental impact.

Materials and methods
Materials. Polymer materials (Table S1) were prepared using a press molding machine (H300-01, AS ONE Corp., Osaka, Japan) and molding methods reported in a previous study 46 .
Time-frequency simulation of 1H-static ssNMR spectra. The time-frequency simulation method was developed in Python 3 using nmrglue 60 for NMR data processing; scipy.signal for the Fourier transform, STFT, and mathematical processing; the curve_fit function of scipy.optimize and BayesianOptimization for fitting; and mpl_toolkits for visualizing the 3D (time, frequency, and intensity) simulation model. Before applying this method, the FID data were phase-corrected, baseline-corrected, and inverse Fourier transformed using TopSpin (Bruker-BioSpin, MA, USA). To calculate the ratio between the Mobile and Intermediate (Mobile) domain components obtained from the MAPE-filtered spectra and the Rigid and Intermediate (Rigid) domain components obtained from the DQ-filtered spectra, the error between the four domain components and the STFT 1H-static spectrum (Static) was calculated using the following equation (Eq. 1).
After finding the α and β parameters that minimize the error, we created a 3D model of the four domain components based on the frequency and T2 relaxation time information.
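Equation 1 itself is not reproduced in this text, so the following is only a sketch under an assumed form: the static profile S is approximated as α times the MAPE-derived (mobile-side) profile plus β times the DQ-derived (rigid-side) profile, with the error taken as the sum of squared residuals. The paper searches α and β with Bayesian optimization; because this assumed error is quadratic in α and β, the sketch swaps in the closed-form least-squares solution of the 2×2 normal equations instead. All function names are hypothetical.

```python
def fit_alpha_beta(static, mape, dq):
    """Minimise E(alpha, beta) = sum_i (static_i - alpha*mape_i - beta*dq_i)**2.

    Assumed form of Eq. 1; solved via the 2x2 normal equations (Cramer's rule)
    rather than the Bayesian optimization used in the paper.
    """
    smm = sum(m * m for m in mape)
    sdd = sum(d * d for d in dq)
    smd = sum(m * d for m, d in zip(mape, dq))
    ssm = sum(s * m for s, m in zip(static, mape))
    ssd = sum(s * d for s, d in zip(static, dq))
    det = smm * sdd - smd * smd            # >= 0 by Cauchy-Schwarz
    if det <= 1e-12 * smm * sdd:
        raise ValueError("MAPE and DQ profiles are (nearly) collinear")
    alpha = (ssm * sdd - smd * ssd) / det
    beta = (smm * ssd - smd * ssm) / det
    return alpha, beta

def squared_error(static, mape, dq, alpha, beta):
    """Residual sum of squares for given scale factors."""
    return sum((s - alpha * m - beta * d) ** 2 for s, m, d in zip(static, mape, dq))
```

A derivative-free optimizer such as Bayesian optimization becomes preferable when the real error function is not a simple linear least-squares problem, for example when line positions or widths are refined at the same time.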
The domain component ratios contained in the polymer material were calculated using the following equation (Eq. 2-5). A detailed description of the time-frequency simulation method is given in Supporting Information Figs. S1 and S2.
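Equations 2 to 5 are likewise not reproduced here. A plausible reading, stated as an assumption, is that each domain's ratio is its modeled intensity (for example, the volume under its 3D model) scaled by α for the MAPE-derived components or β for the DQ-derived components, divided by the scaled total. The helper below is hypothetical and only illustrates that normalization.

```python
# Assumed scale factor per domain: alpha for MAPE-derived, beta for DQ-derived components.
MAPE_DOMAINS = ("Mobile", "Intermediate (Mobile)")
DQ_DOMAINS = ("Intermediate (Rigid)", "Rigid")

def domain_ratios(areas, alpha, beta):
    """Return {domain: fraction}; fractions sum to 1.

    areas: {domain name: modeled intensity of that component} (hypothetical input).
    """
    scaled = {}
    for name, area in areas.items():
        if name in MAPE_DOMAINS:
            scaled[name] = alpha * area
        elif name in DQ_DOMAINS:
            scaled[name] = beta * area
        else:
            raise KeyError(f"unknown domain: {name}")
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}
```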
The weighted average (WA) of the T2 relaxation times for a single polymer material was calculated using the following equation (Eq. 6).
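Equation 6 is not shown either; the natural reading, assumed here, is a ratio-weighted mean, WA = Σ r_i T2,i / Σ r_i over the four domains. In the sketch, the T2 values are the study-wide averages reported in the Results and serve only as example inputs; for a real material, its own fitted per-domain T2 values would be used.

```python
# Study-wide average T2 values (ms) reported in the Results; example inputs only.
AVERAGE_T2_MS = {"Mobile": 0.96, "Intermediate (Mobile)": 0.55,
                 "Intermediate (Rigid)": 0.32, "Rigid": 0.11}

def weighted_average_t2(ratios, t2):
    """Assumed form of Eq. 6: sum(r_i * T2_i) / sum(r_i) over the domains in `ratios`."""
    total = sum(ratios.values())
    return sum(ratios[name] * t2[name] for name in ratios) / total
```

A material dominated by the Mobile domain therefore drifts toward 0.96 ms, and one dominated by the Rigid domain toward 0.11 ms, which is how WA ranks materials such as PCL and PHBH on a single mobility axis.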

Self-organizing maps of domain proportions and thermophysical data in polymer materials.
In the integrated analysis of domain proportions and meta-information of polymer materials, a SOM was produced using the R package kohonen 61. To evaluate the relationships between domain structure and thermophysical properties of polymer materials, we integrated the domain component information calculated by the component separation method, 13C-CP/MAS spectra, which readily reflect the primary chemical structure, and quantitative thermophysical data (TG, DTA, DTG, DSC). To capture the characteristics of the integrated data, clustering by SOM was performed (Tables S1, S2). A basic explanation of SOM is presented in the SOM section of the Supporting Information.
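The study builds its SOM with the R package kohonen. To show what the algorithm does, here is a minimal pure-Python sketch of an online SOM (random initial weights, best-matching unit by Euclidean distance, Gaussian neighborhood on a rectangular grid, and learning rate and radius that shrink linearly). It is not a reimplementation of kohonen's defaults. Each input vector would be one material's features (domain ratios, thermophysical values, element and functional-group flags), assumed here to be already scaled to comparable ranges.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)))

def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training; returns (node weight vectors, node grid coordinates)."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # learning rate decays to ~0
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks to ~0.5 nodes
            br, bc = grid[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(grid):
                # Gaussian neighborhood: nodes near the BMU on the grid move most.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
            step += 1
    return weights, grid
```

After training, mapping each material to its best-matching unit places similar materials on nearby nodes, which is the property the cluster layout in the SOM figure relies on.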

Market basket analysis of domain proportions and meta-information in polymer materials.
MBA was performed using the R package arules 62,63. The binary features "linearly connected methylenes > 4 carbons", "oxygen-containing", and "aromaticity" were set to 1 if true and 0 if false. The numeric data, namely the domain component information calculated by the component separation method and the meta-information on thermophysical properties, elements, and functional groups listed in Tables S1, S2, and S3, were converted to "high" and "low" ranked data, defined as the top 25% and the bottom 25% of all values, respectively. Association rules were determined using the criteria of support, confidence, and lift. Because a lift of 1 indicates that the two items occur independently and a lift below 1 indicates a negative association, this study adopted a lift threshold of 1 for association rules. In addition, because each variable was ranked using the top or bottom 25% of all values, the probabilities of random occurrence are 6.25% for support and 25% for confidence. The maxlen parameter (maximum size of mined frequent itemsets) was set to 2. A basic explanation of MBA is presented in the MBA section of the Supporting Information. The association network was visualized using Cytoscape.
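The preprocessing described above (binary flags as items, numeric columns split into top-25% "high" and bottom-25% "low" items) and the chance levels behind the 6.25% and 25% figures can be sketched as follows. The column names and the tie handling (ties at a cut-off are included) are our assumptions; the study's own pipeline is in R.

```python
def rank_high_low(values):
    """'high' for the top 25 %, 'low' for the bottom 25 %, None otherwise.
    Ties at a cut-off are included, so a label can cover slightly more than 25 %."""
    ordered = sorted(values)
    q = max(1, len(ordered) // 4)        # number of values in each 25 % tail
    low_cut, high_cut = ordered[q - 1], ordered[-q]
    return ["high" if v >= high_cut else "low" if v <= low_cut else None
            for v in values]

def to_transactions(table, binary_cols):
    """table: {column: [value per material]} -> one item set per material.
    Binary 0/1 columns become plain items; numeric columns become 'col=high'/'col=low'."""
    n = len(next(iter(table.values())))
    transactions = [set() for _ in range(n)]
    for col, values in table.items():
        if col in binary_cols:
            labels = [col if v == 1 else None for v in values]
        else:
            labels = [None if lab is None else f"{col}={lab}"
                      for lab in rank_high_low(values)]
        for items, lab in zip(transactions, labels):
            if lab is not None:
                items.add(lab)
    return transactions

# Chance levels when every ranked item covers 25 % of materials independently.
P_ITEM = 0.25
RANDOM_SUPPORT = P_ITEM * P_ITEM          # 0.0625, i.e. 6.25 %
RANDOM_CONFIDENCE = P_ITEM                # 0.25, i.e. 25 %
RANDOM_LIFT = RANDOM_CONFIDENCE / P_ITEM  # 1.0, the independence baseline
```

Restricting rules to two items (maxlen = 2) keeps every rule a single pairwise link, which maps directly onto one edge of the association network.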
Tool development for automated spectral simulation. We have created a Bayesian optimization-based 50 spectral simulation tool that automates the T2 relaxation time domain fitting, frequency fitting, and 3D domain modeling of our domain component separation method. Details of the Python program, including the T2 relaxation time information, frequency information, and 3D domain modeling of the present domain component separation method, are available at https://github.com/riken-emar/matrigica. To improve prediction accuracy, intermediate regression models were employed when performing in-phase machine learning. In addition, we developed a website dedicated to the established domain component ratio calculation, which is freely available at http://dmar.riken.jp/matrigica/.

Data availability
The analytical tool and numerical data for association analysis used in this study were deposited at the following websites: http://dmar.riken.jp/NMRinformatics/MatRigiCa.zip and http://dmar.riken.jp/NMRinformatics/DatasetForMatRigiCa.zip.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.